Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems

Authors

Jia Yu

Mohamed Sarwat

Abstract

Classic database indexes (e.g., B-Tree), though they speed up queries, suffer from two main drawbacks: (1) An index usually incurs 5% to 15% additional storage overhead, which results in a non-negligible dollar cost in big data scenarios, especially when deployed on modern storage devices. (2) Maintaining an index incurs high latency because the DBMS has to locate and update the index pages affected by changes to the underlying table. This paper proposes Hippo, a fast yet scalable database indexing approach. It significantly shrinks the index storage and mitigates maintenance overhead without compromising much on query execution performance. Hippo stores disk page ranges instead of tuple pointers in the indexed table to reduce the storage space occupied by the index. It maintains simplified histograms that represent the data distribution and adopts a page grouping technique that groups contiguous pages into page ranges based on the similarity of their index key attribute distributions. When a query is issued, Hippo leverages the page ranges and histogram-based page summaries to identify the pages whose tuples are guaranteed not to satisfy the query predicates, and inspects only the remaining pages. Experiments based on real and synthetic datasets show that Hippo occupies up to two orders of magnitude less storage space than the B-Tree while still achieving query execution performance comparable to that of the B-Tree for 0.1% to 1% selectivity factors. The experiments also show that Hippo outperforms BRIN (Block Range Index) in executing queries with various selectivity factors. Furthermore, Hippo achieves up to three orders of magnitude less maintenance overhead and up to an order of magnitude higher throughput (for hybrid query/update workloads) than its counterparts.
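The page-range idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `RangeEntry`, `build_index`, `range_query`, and `max_density` are hypothetical, the histogram cut points are hand-picked rather than taken from DBMS statistics, and the rule for closing a page range (the fraction of histogram buckets touched exceeds a threshold) is an assumption made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class RangeEntry:
    first_page: int  # first page id covered by this index entry
    last_page: int   # last page id covered (inclusive)
    bitmap: int      # bit b is set if some key in the range falls in bucket b


def bucket_of(key, bounds):
    """Return the histogram bucket of key; bounds are ascending cut points."""
    return bisect_right(bounds, key)


def build_index(pages, bounds, max_density=0.25):
    """Group contiguous pages into ranges (assumed rule: close a range once
    the fraction of buckets it touches exceeds max_density)."""
    n_buckets = len(bounds) + 1
    entries, start, bitmap = [], 0, 0
    for pid, page in enumerate(pages):
        for key in page:
            bitmap |= 1 << bucket_of(key, bounds)
        if bin(bitmap).count("1") / n_buckets > max_density:
            entries.append(RangeEntry(start, pid, bitmap))
            start, bitmap = pid + 1, 0
    if start < len(pages):  # flush the last, still-open range
        entries.append(RangeEntry(start, len(pages) - 1, bitmap))
    return entries


def range_query(pages, entries, bounds, lo, hi):
    """Return (matching keys, inspected page ids) for lo <= key <= hi."""
    qmask = 0
    for b in range(bucket_of(lo, bounds), bucket_of(hi, bounds) + 1):
        qmask |= 1 << b
    hits, inspected = [], []
    for e in entries:
        if (e.bitmap & qmask) == 0:
            continue  # no key in this page range can satisfy the predicate
        for pid in range(e.first_page, e.last_page + 1):
            inspected.append(pid)
            hits.extend(k for k in pages[pid] if lo <= k <= hi)
    return hits, inspected


# Toy table: each inner list is one disk page of index-key values.
pages = [[1, 2], [3, 4], [5, 12], [21, 22], [23, 35], [36, 40]]
bounds = [10, 20, 30]  # four buckets: <10, [10,20), [20,30), >=30
entries = build_index(pages, bounds)
print(range_query(pages, entries, bounds, 11, 15))
```

In this sketch each index entry costs two page ids and one small bitmap regardless of how many tuples it covers, which is the source of the storage saving the abstract claims. A lower `max_density` produces more, narrower ranges, so fewer pages are inspected per query at the cost of a larger index.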

Similar resources

Indexing the Pickup and Drop-Off Locations of NYC Taxi Trips in PostgreSQL - Lessons from the Road

In this paper, we present our experience in indexing the drop-off and pick-up locations of taxi trips in New York City. The paper presents a comprehensive experimental analysis of classic and state-of-the-art spatial database indexing schemes. The paper evaluates a popular spatial tree indexing scheme (i.e., GIST-Spatial), a Block Range Index (BRIN-Spatial) provided by PostgreSQL as well as a new...

Slalom: Coasting Through Raw Data via Adaptive Partitioning and Indexing

The constant flux of data and queries alike has been pushing the boundaries of data analysis systems. The increasing size of raw data files has made data loading an expensive operation that delays the data-to-insight time. Hence, recent in-situ query processing systems operate directly over raw data, alleviating the loading cost. At the same time, analytical workloads have increasing number of ...

A Superimposed Codeword Indexing Scheme for Handling Sets in Prolog Databases

While there has been growing interest in the use of Prolog for database applications, the size of these applications is limited by the capabilities of current Prolog systems for handling disk-resident clauses. A major impediment is the inordinate amount of time required for retrieval and unification of clauses from a large set stored on disk. Indexing is commonly used in conventional database s...

Lightweight Indexing for Log-Structured Key-Value Stores

The recent shift towards write-intensive workloads on big data (e.g., financial trading, social user-generated data streams) has driven the proliferation of log-structured key-value stores, represented by Google’s BigTable [1], Apache HBase [2] and Cassandra [3]. While providing key-based data access with a Put/Get interface, these key-value stores do not support value-based access methods, which...

Use of Transforms for Indexing in Audio Databases

The phenomenal increases in the amounts of audio data being generated, processed, and used in several computer applications have necessitated the development of audio database systems with newer features such as content-based queries and similarity searches to manage and use such data. Fast and accurate retrievals for content-based queries are crucial for such systems to be useful. Efficient cont...

Journal title:

PVLDB

Volume 10, Issue

Pages -

Publication date: 2016
